Chapter 5 

How to Measure Animal Personality 
and Why Does It Matter? Integrating 
the Psychological and Biological Approaches 
to Animal Personality 


Sonja E. Koski 

5.1 Introduction: What Is Animal Personality? 

During the last few years individual differences in nonhuman animal (hereafter 
“animal”) behavior have been a subject of rapidly growing research interest 
(reviews in Reale et al. 2007; Sih and Bell 2008). This has met the much older 
research tradition of personality psychology, which includes human and, more 
recently, animal personality (Gosling 2001). Individual differences in behavior and 
their underlying psychology are now increasingly relevant research fields in several 
species of animals. 

Individuals in many species, from invertebrates to lizards, fish, birds, and mammals, 
differ in their behavior from each other. This variation is often temporally consistent, 
meaning that an individual’s general behavioral tendency stays similar over time 
(Sih et al. 2004b). Behavioral tendencies generalize to some extent across situations, 
so an individual shows limited plasticity in its responses (Sih et al. 2004a, b). Behavioral 
tendencies are heritable (van Oers et al. 2005), have significant fitness consequences 
(Smith and Blumstein 2008), and may be organized hierarchically so multiple 
traits correlate to form higher organizational levels (Reale et al. 2007; Sih and 
Bell 2008). Consistent behavioral variation can be named “personality,” following the 
human personality psychology research tradition. Also, other terms (e.g., temperament,
coping style, behavioral syndrome) have been applied to consistent interindividual
behavioral variation, each having its own particular connotation (Reale et al.
2007; Sih and Bell 2008). In this chapter, however, I treat them as synonyms and
define personality as consistent interindividual variation in behavior. With this definition,
I take no stand regarding the proximate-level mechanisms, including psychological
ones, underpinning behavior.

Consistent variation in behavior is evolutionarily puzzling because natural selection 
would be expected to winnow out any variation that has fitness consequences, so 
that over time the optimally adaptive level of a behavioral trait would be the prevailing
phenotype in a population. For example, if bold individuals locate food faster than 
shy individuals, bolder individuals could be expected to have higher fitness and 
thus, in time, boldness should be favored over shyness. Consistency in behavior is 
also challenging to understand because it limits an individual’s flexibility to adjust 
its behavior to deal with a situation in an optimal way. Moreover, behavioral variation
often occurs in suites of correlated behaviors; for example, individuals that are
relatively bolder are also relatively more aggressive (Sih and Bell 2008), evoking 
questions about why such covariation should exist. 

The booming research on animal personality applies various approaches and 
methodologies. There has been much debate over how to best assess animal personalities
(e.g., Gosling and Vazire 2002; Itoh 2002; Reale et al. 2007; Vazire et al.
2007; Uher and Asendorpf 2008). In this chapter, two fields of animal personality
research are discussed, psychological and biological, that differ drastically from
each other in their approaches. However, despite the seemingly fundamental differences,
common ground can be found. Although their particular paradigms may
differ, the aims of the two approaches are in the end similar: to map the depth and 
width of individual differences in behavior; to understand the structure of this variation;
to understand its evolutionary consequences and underlying mechanisms; and
to facilitate predictions, which in turn are helpful in a broad range of applications.
Indeed, several researchers have stressed the benefits of such integration (Gosling
2001; Nettle 2006; Sih and Bell 2008; Uher 2008a; Brosnan et al. 2009). To achieve
integration and synergy, we need to understand each other’s approaches and their 
conceptual and practical consequences. Therefore, I discuss some of the conceptual, 
methodological, and practical issues that have been put forward in the biological 
and psychological animal personality literature in recent years. I propose that with 
increased methodological care and clarity in reporting, the two fields can benefit 
one another. Thereafter, I highlight some areas of research in which common 
ground can be found and suggest prospects for future animal personality research. 


5.2 Two Approaches to Animal Personality 

Animal personality research is roughly dichotomized between the human-oriented 
personality psychological tradition and the animal-oriented biological tradition. 
The “psychological” and “biological” approaches to personality differ at conceptual, 
methodological, and practical levels. 

The psychological approach is adopted by comparative personality psychologists
who apply the well-established human personality theory and methodology to
animal personality research. The aim is to understand similarities and differences
in human and animal personality regarding the structure, underlying neuropsychological
mechanisms, and evolutionary history. In humans, personality is understood as
a psychological construct that influences behavior and is organized in a hierarchical 
structure (Maltby et al. 2007; see also, e.g., Allport 1961; Fast and Funder 2008 for 
various definitions). According to the widely accepted Five-Factor Model (FFM),
human personality consists of five stable superordinate domains or constructs 
labeled Extraversion, Openness, Conscientiousness, Neuroticism, and Agreeableness, 
each of which includes a number of subordinate facets (Costa and McCrae 1992; 
McCrae and Costa 2008; but see Eysenck 1991, 1992; Ashton and Lee 2007 for 
alternative models). Personality dispositions are heritable (Bouchard and Loehlin 
2001) and associated with important life outcomes, such as subjective well-being, 
health, mortality, mating success, quality of social relationships, occupational performance,
psychopathology, and the likelihood of serious injuries (Franken et al.
1990; Neeleman et al. 2002; Nettle 2005; Ozer and Benet-Martinez 2006; Roberts 
et al. 2007). Although traditionally the evolutionary significance and underlying 
mechanisms of human personality have received limited attention (Nettle 2006, 
2008), advances have been made in recent years (e.g., personality genetics: Bouchard 
and Loehlin 2001; Ebstein 2006; Penke et al. 2007; brain substrates: Gardini et al. 
2009; evolutionary significance: Nettle 2005, 2006; Smith and Blumstein 2008). 

Some personality psychologists have become interested in comparative personality
research and have described animal personality within the framework of
human personality psychology (e.g., Gosling and Vazire 2002; Weiss et al. 2007; 
King et al. 2008). This has wide-ranging benefits for personality psychology 
research, such as clarifying the phylogenetic history of personality, as well as the 
effects of genetic dispositions, development, and environment that are challenging 
to tackle in research on humans alone (Gosling 2001; Gosling and Graybeal 2007).
The concept of personality as a hierarchical psychological construct is extended to 
animal personality, with the expectation that animal personality exhibits a similar 
structural organization (e.g., King and Figueredo 1997). The framework of the FFM 
has been taken as the starting point, and the methods are adopted from those used 
in human personality research. 

In human personality research, the most common method of obtaining data is 
self-rating, in which people evaluate themselves on lists of descriptive terms 
(“items”). However, assessment by knowledgeable informants (e.g., peers, parents, 
teachers) is also accepted and widely used (Boyle et al. 2008); and self- and other- 
rating can also be used in conjunction (e.g., Fast and Funder 2008). Assessing animals 
by knowledgeable informants (e.g., animal care-takers) is considered a logical
continuation of these methods (King and Figueredo 1997; Gosling and Vazire 2002). 
Thus, people rate animals on questionnaires that list descriptor items, which can be 
adjectives or behavioral descriptions, such as “curious” or “subject often touches 
new objects at great length” (Uher and Asendorpf 2008). So long as certain
criteria (most importantly interrater reliability and construct validity; see definitions
below) are fulfilled, the results are considered to reflect subjects’ personality
traits (Gosling and Vazire 2002; Vazire et al. 2007). 

This work has revealed that personality of animals can successfully be characterized
within the FFM framework (Gosling and John 1999; Gosling 2001; Capitanio
and Widaman 2005). Personality constructs of animals vary in their degree of similarity
to the human FFM, from highly similar (e.g., neuroticism in orangutans)
(Weiss et al. 2006) to very different (e.g., dominance in chimpanzees) (King and
Figueredo 1997). This work has also led to further studies on heritability and cross-population
consistency of ape personality and on apes’ subjective well-being
(Weiss et al. 2000, 2002, 2006, 2007, 2009).

The biological approach, as practiced by behavioral biologists, aims at finding 
out the mechanisms underlying, and the evolutionary forces maintaining, variation 
in personality traits. This approach builds on the traditions of ethology, behavioral 
biology, and evolutionary and theoretical ecology. Therefore, it relies on quantifying 
outward behavior. Whereas individual differences in animal behavior have been 
recognized as long as people have systematically observed animals (e.g., Yerkes 
1939; Pavlov 1951; Stevenson-Hinde et al. 1980), in biological research variation 
was long considered the raw material for natural selection to act upon, rather than 
being adaptive in and of itself (e.g., Dall et al. 2004; Sih et al. 2004a, b; van Oers
et al. 2005). 

Variation around the assumed optimal mean was thought of as “noise” (Wilson 
1998), and behavioral research aimed to find these optimal means for species, age, 
and sex categories. However, following increased interest in individual-based 
approaches and improved analytical methods, research has shown “noise” to be 
actively maintained by evolutionary processes (e.g., Dall et al. 2004; Wolf et al.
2007; Garamszegi et al. 2008; McNamara et al. 2009). This realization has led to
efforts to quantify individual variation in behavioral traits. Usually behavioral personality
research makes use of experimental testing, in which variation in a trait can
be quantified by subjecting individuals to varying conditions or stimuli, such as a
novel environment or a predator model (Reale et al. 2007). In addition, some biological
personality research uses behavioral observations in natural or captive circumstances
without experimental manipulation (Anestis 2005). Experimental and
nonexperimental observations of behavior are coded to yield quantitative data on 
behavioral frequencies. This research has shown that in many species particular 
traits (e.g., aggressiveness, exploratory tendency, boldness, general activity) vary 
consistently among individuals (Sih et al. 2004b; Reale et al. 2007). Consistent 
variation is suggested to be maintained by frequency-dependent selection, mutation-selection
balance, spatiotemporal variation in environmental conditions, and
trade-offs between alternative strategies (Dall et al. 2004; Dingemanse et al. 2004;
Wolf et al. 2007; Sih and Bell 2008; McNamara et al. 2009).

Research on the evolutionary mechanisms of personality is still largely in its
infancy, and much more theoretical and empirical work is needed to
clarify the observed patterns in variation and consistency. What is clear, however,
is that these findings have brought consistent individual differences into the focus
of behavioral research. Recognition of the relevance of an individual as an explanatory
level has large repercussions for studies on, for example, theoretical ecology,
learning, cognition, mating behavior, and cooperation (Dall et al. 2004; Sih and
Bell 2008; McNamara et al. 2009). Moreover, the concept of behavioral syndromes
(i.e., consistently correlated behavioral traits) draws attention to limited plasticity 
in behavior, carry-over effects, and connections between behavioral traits that have 
traditionally not been studied together (Sih et al. 2004a, b; Sih and Bell 2008; Smith 
and Blumstein 2008). 

The psychological and biological approaches have, at the outset, little in common.
The gulf is further widened by the tradition of publishing in separate journals and 
attending different conferences. Consequently, the results are not easily comparable, 
and the advances in research in the respective fields often remain unrecognized by 
researchers taking different approaches. However, these approaches need not be so 
far apart. 


5.3 Concepts in Personality Research: A Matter of Definitions 
and Analytical Levels 

Conceptualization of personality in research is fundamentally a question of definitions
and of the level of analysis. Nevertheless, it is by no means a trivial issue.
Recently, Uher (2008a, b) highlighted it as one of the three critical issues of comparative
personality research: how to conceptualize, identify the domains of, and
measure individual variation.

The psychological and biological approaches to personality differ in their 
conceptualizations of personality. In the psychological tradition, the definition of 
personality allows multiple levels (psychological, situational, and behavioral) 
(Funder 2006). Conceptually, the trait hierarchy is emphasized, which demands 
knowledge of multiple traits and their relative externalizations. Personality is seen
as a complex, hierarchical structure of narrow trait dimensions nested within 
broader trait dimensions. On the other hand, personality defined biologically, as 
consistent interindividual variation at the level of behavior, tends to ignore psychological
influences and the hierarchical structure of traits (although the concept of
behavioral syndromes is close to the concept of hierarchical structure in personality
psychology) (Sih and Bell 2008; see also below). Although mechanisms are examined,
they are not part of the definition of personality. Fundamentally, however, the
differences between psychological and biological conceptualizations are minor, as 
both agree that personality describes stable interindividual variability in traits 
within a population (Sih and Bell 2008; Uher 2008a). The necessary criteria,
consistency within and variation between individuals, can be assessed at any trait
organizational level, including psychological dispositions (see discussions on trait 
organization in, e.g., Reale et al. 2007; Uher 2008a). 

More problematic conceptual issues arise in comparative personality research, 
as species’ biology strongly determines their behavior; consequently, comparing 
particular behavioral traits may lead to a “comparing apples with oranges” problem 
(cf. Gosling 2001). For example, an antelope’s vigilance and easily triggered 
escape response may give an impression of a “shy” species if compared to a lion,
and a hornet’s tendency to attack may give the impression of an “aggressive” 
species if compared to a sheep. However, the antelope’s “shyness” and the hornet’s 
“aggressiveness” result from particular ecological selection pressures leading 
to species-typical behavior. Within these species, aggressiveness and shyness 
can be characterized as personality traits if they exhibit greater between- than 
within-individual variation that is sufficiently stable in a population. How, then, are
we to compare the aggressiveness of hornets and sheep? Uher (2008a, b) proposed 
that we borrow conceptualizations for comparative personality research from 
cross-cultural personality psychology. That is, we should aim at identifying population-specific
personality traits, weak universal personality traits, and strong
universal personality traits. Population-specific personality traits are specific to a 
given species and thus cannot be compared across species. Universal traits are 
those that show consistent variation across species and are therefore comparable; 
strong universals are those that show significant differences between species’ 
mean and variance of the trait distribution, and weak universals are those in which 
the mean and variance of the trait distribution are the same across species. When 
trait distributions differ among species, a mathematical standardization of the 
scores is necessary to allow comparisons. 
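
As an illustration only: the kind of standardization meant here is not specified
further in the text. One common choice, assumed in the sketch below, is to convert
raw scores to z-scores within each species, so that an individual's score expresses
its position relative to its own species. The function name and the example data are
hypothetical.

```python
from statistics import mean, pstdev


def standardize_within_species(scores_by_species):
    """Return z-scores computed separately within each species.

    scores_by_species: dict mapping a species label to a list of raw
    trait scores (one per individual). Hypothetical helper for illustration.
    """
    standardized = {}
    for species, scores in scores_by_species.items():
        mu = mean(scores)
        sigma = pstdev(scores)  # population SD of this species' scores
        if sigma == 0:
            raise ValueError(f"no variation in scores for {species}")
        standardized[species] = [(x - mu) / sigma for x in scores]
    return standardized


# Made-up latencies (seconds) to approach a novel object.
raw = {
    "species_a": [10.0, 20.0, 30.0],
    "species_b": [100.0, 200.0, 300.0],
}
z_scores = standardize_within_species(raw)
```

Within-species z-scoring deliberately removes species-level differences in mean and
variance, so those differences have to be examined separately if they are of interest.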

The shyness-boldness continuum has been proposed as a universal trait due to 
substantial evidence of its existence in several species, including humans (Beaton 
et al. 2008; Sih and Bell 2008), but little is known of its trait value means and distributions
across species. Uher’s (2008a) framework is applicable to comparative
research (but see Realo and Allik 2008). However, the framework puts little
emphasis on how differences arise as a consequence of proximate determinants and
evolutionary selection pressures. Even if a trait, such as boldness or aggressiveness,
exhibits interindividual variation in a broad range of species, the proximate mechanisms
may be different in different species. Moreover, the mean and variance of the
trait distribution are likely to be strongly affected by species’ ecology; consequently,
particular traits are subjected to different selection pressures in different
species. Therefore, in comparative research, identification of variation in a trait 
should ideally include determining proximate-level mechanisms and accounting for 
the species’ ecology and the selection pressures acting on the trait in the target species
(Reale et al. 2007). 


5.4 Methodologies in Personality Research 

5.4.1 How Are Candidate Personality Traits Selected, Extracted, 
and Analyzed? 

The methodological core issues concern how to (1) identify and (2) measure the 
domains of behavioral variation (cf. Uher 2008b). Identification, or selection, of 
candidate personality traits refers to the a priori process of deciding which traits are 
sampled as potentially personality-relevant ones. First, the definition and hierarchical 
level of a candidate trait are to be decided. One problem lies in the fact that the 
biological and psychological definitions of the term “trait” differ: Biologists 
understand a trait to be any (quantifiable) characteristic (see discussion of the term 
“characteristic” by Wagner 2001), whereas psychologists apply the term to internal 
dispositions that influence behavior (Larsen and Buss 2005). Throughout this
chapter, the term trait is used with its biological meaning and is limited to behavioral
characteristics, following my focus on behavioral variation. Clarifying the
definition of trait would reduce misunderstandings between the behavioral and 
psychological personality literature (see recent discussion of confusion about 
terminology by Carere and Maestripieri 2008; Uher 2008a, b; van Oers 2008). 

The hierarchical level of the candidate trait is also relevant in the selection process.
For example, maternal behavior is a composite trait consisting of nursing,
carrying, protecting, grooming, and so on. These behaviors could be defined as
individual traits or as measures of one composite trait. Each trait level is structurally
connected to others below and above it (Reale et al. 2007). The organizational level
at which candidate traits are chosen depends on the question and the scale of the
research. For example, if one is interested in behavioral syndromes, selection of
candidates should include several lower-level traits that are examined for their
interdependence, whereas if the goal is to identify fitness consequences of a particular
trait, starting from a higher hierarchical level may prove more useful.

Uher (2008a) has summarized various approaches to selecting candidate traits. 
A “nomination approach” relies on the human ability to choose appropriate traits 
based on our perception of variation in animals, which allows a researcher to name 
the candidate traits. The better the species’ behavior is known, the more likely it is that
meaningfully varying behaviors are selected. An “adaptive approach” assesses
the trait’s biological relevance in ecology or evolution of the species, so traits 
with the most significant fitness consequences in the past or present are assessed. 
A “top-down approach” takes personality traits found in other species and seeks 
similarities and differences in the target species. It can select a singular trait or base 
the selection on a broader hierarchical model of personality. Finally, a “bottom-up” 
approach starts from the study species and identifies candidates from its behavior 
or underlying mechanisms (see Uher 2008a, b for a thorough discussion). 

Although this is an insightful categorization of the candidate trait selection 
approaches, some of these approaches are in practice close to each other and used 
in combination. For example, exploratory tendency and aggressiveness have been
found to influence individuals’ fitness in great tits (Parus major: Dingemanse et al.
2004), and consequently other studies have examined variation in these traits in
other species (e.g., mouse lemur Microcebus murinus: Dammhahn 2009; dog Canis
familiaris: Svartberg et al. 2005; guppy Poecilia reticulata: Burns 2008; starling
Sturnus vulgaris: Minderman et al. 2009; collared flycatcher Ficedula albicollis:
Garamszegi et al. 2008). Selecting exploration tendency as a candidate trait in starlings
combines the adaptive approach and the top-down approach and is possibly
also influenced by knowledge that such behavior is part of the species’ behavioral
repertoire, thus adding the nomination approach to the list. Also, Weiss and Adams
(2008) have pointed out that whereas rating animals on an adjective descriptor item 
list (for which the items were derived from the human personality model) utilizes 
personality traits of another species by the top-down approach, the item descriptions 
are often adjusted to each species’ particular behavioral repertoire, thus combining 
the top-down, bottom-up, and nomination approaches. 

I combine and simplify Uher’s (2008a) classification as mutually nonexclusive
“know your species,” “traits relevant in other species,” and “compare to humans” 
approaches to candidate trait selection. The know your species approach relies on 
a broad knowledge base of the species’ behavioral repertoire, socioecology, life 
history, and evolutionary history. When species’ behavior has been sampled over 
years, in some cases decades, covering multiple individuals in multiple contexts 
and multiple populations, I consider it justified to select a range of naturally occurring
behaviors as personality trait candidates. The benefit of this approach is that
the selected candidate traits are likely to be ecologically relevant and part of the 
natural behavioral repertoire of the species. The drawback is a potential danger of 
selecting more easily accessible traits at the expense of rare or less conspicuous 
traits. The traits relevant in other species approach is an exploratory approach to test 
specific traits that have shown significance as personality traits in other species, 
either closely or more distantly related. The benefits of this approach are that it 
allows an assessment of the trait’s phylogenetic history and mapping of the trait’s 
generality across species (Gosling and Graybeal 2007). Also, it provides a time-saving
shortcut to personality of a previously unstudied species. The most obvious
drawback is that it may fail to account for traits relevant for the particular study 
species. Finally, the compare to humans approach seeks to identify those personality 
traits in animals that are known in humans, putting the focus on human-nonhuman 
similarities and differences. The drawback is that it may exclude traits that are 
absent in humans but biologically relevant for the target species (Uher 2008b; Uher 
and Asendorpf 2008; cf. Gosling and John 1999; Gosling 2001). 

Based on ecological relevance and generality across species (i.e., know your 
species and traits relevant in other species approaches), Reale et al. (2007) nominated
five categories for candidate personality traits in animals: shyness-boldness,
exploration-avoidance, activity, aggressiveness, and sociability (see also Bell
2007). This proposition does not specify the genetic or neurophysiological mechanisms,
nor the structural level at which the trait should be measured. These
trait categories are likely to be ecologically and evolutionarily relevant regardless 
of the species’ particular ecological conditions (Sih and Bell 2008). A possible 
drawback of keeping to these five categories is that it may limit research efforts to 
only these trait categories at the expense of other candidates. Moreover, in many 
species, some of these trait categories are correlated, suggesting structural dispositions
among them (Sih and Bell 2008). Therefore, the five trait categories should
not be presupposed to be independent. 

All the aforementioned approaches have their own pros, cons, and justifications 
depending on the particular goal of the research. A common drawback in the outlined
selection processes concerns the structural analysis. Understanding the structural
hierarchy of personality traits is relevant because it clarifies the connections
between traits and directs us to the mechanisms underlying these connections (Reale 
et al. 2007; Sih and Bell 2008). However, all selection processes necessarily limit 
structural analysis to the assessed traits only and may neglect other traits that are 
potentially more biologically significant for a given species. The traits relevant in 
other species approach identifies variation often only in one or two candidate traits 
(e.g., boldness or aggressiveness) and disregards the structural organization altogether.
The know your species approach may nominate easily observable traits in the target 
species’ repertoire at the expense of less easily identified traits, again leading to a 
potentially incomplete structure. The compare to humans approach examines animals
for those traits that are relevant for the human personality model, potentially
biasing the structure toward that of humans. Furthermore, we must acknowledge that 
none of the selection methods produces an all-inclusive list of personality traits 
in any given species. For example, if boldness is identified as a personality trait in 
great tits, we cannot argue that we have found all there is to be found in great 
tit personality. Similarly, if human personality traits are used as a template to identify 
personality in chimpanzees, and subsequently some or all of these traits are shown 
to exhibit consistent variation in chimpanzees, it does not mean that chimpanzee 
personality only includes variation in those traits. So long as researchers are aware 
of the drawbacks and state clearly the reasons for and methods of their selection of 
candidate traits, comparisons with other studies are possible. 

The second methodological concern is how the chosen traits are measured. The 
biological approach relies on coding expressed behavior. Biologists observe animals
and extract quantitative data from these observations. The situation may be
experimentally induced, or a trait may be coded in nonmanipulated living conditions
in the wild or captivity. Once the candidate traits are chosen, appropriate paradigms
for measuring the target trait are designed, and standard observational
techniques are applied. For example, to test exploration tendency, an animal is put 
into a novel environment and its response is measured by, for example, latency to 
visit a particular part of the novel environment (Verbeek et al. 1994). To test boldness-shyness,
a novel object is introduced into the environment, and the subject’s
latency to approach the object is recorded (Wilson et al. 1993). These tests are 
repeated over time for each subject. Data are analyzed for within- and between- 
subject variation and temporal consistency. In observations of nonexperimental 
situations, behavioral data of candidate traits are collected over a significantly longer
period of time and across several contexts. The most common methods involve
focal and scan sampling (e.g., Capitanio 1999; Uher et al. 2008). Precautions must 
be taken to avoid bias to particular individuals or to variation in behavioral patterns 
due to circadian rhythm and care-taking events by randomizing the order of focal 
individuals and observing every animal repeatedly at different times of the day and 
in several contexts, all of which are standard practices in behavioral research 
(Martin and Bateson 1993). To ensure that the sample is representative of real
behavior, a sufficient amount of data from each individual and a rigorous observational
technique are key criteria. Nevertheless, in nonexperimental settings, it may
be difficult to obtain data for a particular trait as it occurs in nonsystematized contexts
and is likely to be influenced by other factors than those we are interested in
studying (Reale et al. 2007). On the other hand, observing behavior as it occurs 
without experimental stimulation avoids the problem of artificial or ecologically 
irrelevant situations. 
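
The analysis of within- and between-subject variation in repeated tests is often
summarized as repeatability, the proportion of total variance attributable to
differences among individuals. The sketch below is one way this could be computed,
assuming a balanced design (every individual tested the same number of times) and a
one-way ANOVA estimator. The chapter does not prescribe this estimator, and the
function name and data are hypothetical.

```python
def repeatability(measures_by_individual):
    """Estimate repeatability from repeated measures (balanced design).

    measures_by_individual: dict mapping an individual's ID to a list of
    repeated measures of one trait. Hypothetical helper for illustration.
    """
    groups = list(measures_by_individual.values())
    n_ind = len(groups)
    k = len(groups[0])  # number of repeated measures per individual
    if n_ind < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 individuals with equal numbers (>= 2) of measures")

    group_means = [sum(g) / k for g in groups]
    grand_mean = sum(group_means) / n_ind

    # Sums of squares among individuals and within individuals.
    ss_among = k * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_among = ss_among / (n_ind - 1)
    ms_within = ss_within / (n_ind * (k - 1))

    # Among-individual variance component; can be negative in small samples.
    var_among = (ms_among - ms_within) / k
    total = var_among + ms_within
    if total == 0:
        raise ValueError("no variation in the data")
    return var_among / total


# Made-up latencies (seconds) from two test rounds for three individuals.
latencies = {"a": [10.0, 12.0], "b": [20.0, 22.0], "c": [30.0, 32.0]}
r = repeatability(latencies)
```

A value near 1 indicates that most variation lies between individuals, as in this
made-up example; values near 0 indicate that individuals vary as much across their own
tests as they differ from one another.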

The psychological approach relies on knowledgeable people rating the study 
subjects. The item lists consist of trait-relevant descriptors sometimes accompanied 
by explanatory descriptions of the terms (e.g., Weiss et al. 2007). For example,
the item “sympathetic” is given with an explanation “(s)ubject seems to be considerate
and kind toward others as if sharing their feelings or trying to provide
reassurance” (Weiss et al. 2006). Subjects’ item scores are analyzed for interrater
reliability, which refers to agreement among raters in their assessment of a particular
individual. The data are then subjected to data reduction methods (usually factor
analysis or principal components analysis), which yield information on multiple 
traits and their taxonomic structure. The rating method relies on people’s intuitive 
ability to mentally collate and hold information of an animal’s characteristics in 
meaningful categories (Gosling and Vazire 2002; Uher 2008a). The benefit of this 
method is its practicality, as large numbers of animals can be sampled in a short 
time. In addition, the cross-situational consistency and reliability of the rating
method are argued to be superior to those of the behavioral coding method because
ratings incorporate information over time and contexts, whereas behavioral codings are
necessarily more limited in time and are vulnerable to the influence of context (Vazire et al.
2007; see also below). However, the potential drawbacks of rating include unclear 
correspondence with behavior, unclear ecological relevance of rated items, and the 
ever-looming possibility of anthropomorphic projections (see below). 


5.4.2 Diagnostic Criteria: Validity, Reliability, Repeatability 

In Gosling’s (2001) comprehensive review on animal personality encompassing a 
broad range of species, more than 70% of animal personality research had relied on 
coding observed behavior. The rating method, however, was proportionately more 
often (42%) favored in studies on primates and domesticated animals. This reflects 
one of the key differences between the coding and rating methods: Rating animals 
is possible only when we can mentally represent an animal’s behavioral
characteristics as impressions that translate to the descriptor items. This is likely to be easier
when we find the behavior intuitively “understandable,” which is more probable in
closely related species and species with which we have associated for a
considerable part of human evolution (Gosling et al. 2003; Weiss and Adams 2008; Uher
2008b). It is thought that we hold less-clear representations of the behavior of
animals that are taxonomically more distant from us or otherwise less familiar, and
thus rating these animals with descriptive items enhances the risk of interpretational 
errors. Coding behavior, in contrast, is possible with any organism so long as there 
is an a priori agreement of what exactly is being coded (i.e., the definition of the 
behavior is clear). 

The danger of anthropomorphizing animal behavior is present when we rate 
animals based on intuitive impressions and presumably more so when items are 
derived from human personality theory. The issue of anthropomorphism has been 
hotly debated in the animal personality literature, and it is beyond the scope of this 
chapter to summarize all of the arguments (e.g., Gosling and John 1999; Gosling 
and Vazire 2002; Itoh 2002; Reale et al. 2007; King et al. 2008; Uher 2008a, b). In 
behavioral research, evaluating animals based on impressions of qualities labeled
by human terms without quantified information of the corresponding behavior and 
its generality and frequency of occurrence is considered dubious. In contrast, the 
psychological research tradition puts little emphasis on behavioral frequency. 
Instead, the emphasis is on the psychometric properties of the emerging personality
scores, which, when acceptable, are considered to reduce the likelihood of
anthropomorphism. However, personality psychology is also well aware of the relevance
of construct validity (i.e., the extent to which ratings reflect the corresponding real-life
behavior and other corresponding and meaningful outcomes). Note that in this
chapter the term “validity” refers to construct validity, not face or concurrent
validity, and convergent and discriminant validity are not separated (cf. Maltby et al.
2007). Construct validation is necessary to ensure that rated personality traits are
reflected in quantifiable differences within the species’ behavioral repertoire (e.g., 
Gosling 2001; Uher and Asendorpf 2008). Yet, thus far, behavioral validation of 
ratings is not the standard procedure (Gosling 2001). 

Animal personality studies that have assessed construct validity have generally 
reported high correspondence between ratings and observed behavior (Feaver et al. 
1986; Pederson et al. 2005; Konecna et al. 2008). In a study on rhesus macaques 
(Macaca mulatta), behavioral scores correlated moderately with the corresponding
rated scores (Capitanio 1999). Moderate to high correspondence was also found in 
two studies on captive chimpanzees (Pan troglodytes) (Pederson et al. 2005; Vazire 
et al. 2007); but in another study (Uher et al. 2008) where behavioral data were 
obtained both experimentally and nonexperimentally, the correspondence between 
rated adjective-item data and coded behavioral data was low. Sometimes the 
obtained validity results are difficult to evaluate owing to biological or
methodological challenges. In a thorough study on wild-ranging male Hanuman langurs
(Semnopithecus entellus), ratings corresponded well to measured behavior, and the
principal components analysis (PCA)-derived personality dimensions obtained by 
behavior coding and ratings agreed with each other (Konecna et al. 2008). However, 
coded and rated scores were influenced by male rank, which in langurs is labile 
(i.e., the rank position changes during the lifetime) (Borries 1997). This raises the
question of to what extent currently observed behavior is a consequence of
(unstable) rank position rather than personality. Furthermore, how do personality 
and rank position interact? As short-term studies are snapshots in time, the causal 
relations between rank position, behavior, and personality are difficult to assess. 
A methodological issue is the independence of rating and behavior coding; to truly 
test the construct validity of ratings, the behavioral data should be obtained
independently (i.e., by different people), but this has not always been the case (e.g.,
Feaver et al. 1986; Capitanio and Widaman 2005; Konecna et al. 2008). Validations 
have also been done by rating observed behavior, rather than quantifying frequency, 
duration, and so on of the target behaviors (Gosling et al. 2003), and have been 
based on limited sampling efforts (Vazire et al. 2007). Finally, researchers may rely 
on the assumption that if someone has validated (some of) the personality traits in 
the same species (even if in a different population), it is unnecessary to perform 
behavioral coding (e.g., Weiss et al. 2002; King et al. 2008). Although cross-population 
rating studies give converging results supporting the generality of personality (King
et al. 2005; Weiss et al. 2007, 2009), the actual behavioral frequencies may well 
differ among populations. In sum, the importance of rigorous behavioral validation 
cannot be overemphasized. As a standard procedure, it would provide data with a 
common metric (i.e., occurrences and frequencies of strictly defined behavioral 
traits) that not only would facilitate comparisons across populations (and
potentially across species), thus complementing data obtained by ratings, but also
comparisons with behavioral personality studies. By including quantified behavioral
measures of rated personality, it is possible to assess directly the similarities with 
studies conducted by behavioral coding. 
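
The behavioral validation described above can be illustrated with a minimal sketch. It assumes that each animal has one rated trait score and one coded behavioral rate for a conceptually equivalent behavior, and it uses Spearman's rank correlation as the index of correspondence. The choice of statistic, the function names, and the example values are all illustrative assumptions, not the procedure of any study cited here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(rated, coded):
    """Spearman rank correlation between rated scores and coded behavior."""
    if len(rated) != len(coded) or len(rated) < 3:
        raise ValueError("need paired scores for at least three individuals")
    return pearson(average_ranks(rated), average_ranks(coded))


# Hypothetical data: one rated "sociability" score and one coded grooming
# rate (bouts per hour) for each of five individuals.
rated_sociability = [2.0, 3.5, 4.0, 5.5, 6.0]
grooming_rate = [0.8, 1.1, 0.9, 2.4, 3.0]
print(spearman(rated_sociability, grooming_rate))
```

For the validation to be meaningful, the two input lists would need to come from different people, as argued above; the code itself cannot check that.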

In contrast to ratings, the problem of construct validity is less of an issue in 
behavioral coding as the actual behavior is quantified. Still, behavioral research is not
free from validation challenges, as there is always an a priori expectation of the
correspondence between the target personality trait (e.g., shyness) and the behavior
(e.g., latency to approach an object), which may or may not be a correct
assumption. To ensure construct validity in behavioral research, knowledge of the species’
behavioral repertoire and the functions of measured behaviors is imperative.
Furthermore, some researchers (Vazire et al. 2007) have raised a concern that 
observers may interpret behavior wrongly (e.g., code submissive behavior as play), 
which would undermine the value of behavioral coding. Of course, as in any
behavioral research, coders must be trained well to be familiar with a species’ behavior
so they recognize the traits they code and record the observations reliably (see
below for discussion on reliability). If knowledge of species’ behavior and
functions of target traits, as well as coders’ sufficient training in recognizing and
obtaining behavioral data are ensured, I consider construct validity of behavior coding to
be, by default, high. 

Biologists stress the importance of another kind of validity, namely ecological 
validity (Reale et al. 2007; Burns 2008). That is, the test design and the behavioral 
measures should be ecologically relevant for the species. For example, a novel 
object that an animal reacts to is likely to be different for a bird than for a fish. 
Furthermore, response in a test should translate into responses in the corresponding 
real-life situation. In a recent study, mouse lemurs exhibited personality variation 
in standard novel object and open environment tests designed to assess variation in 
boldness and exploration tendencies (Dammhahn 2009). However, when the same 
animals were tested in a realistic situation posing varying degrees of risk - feeding 
on the ground (risky situation) versus feeding on a higher platform (safe situation) - 
the responses were not attributable to the measured personality differences. The 
author hypothesized that exploration tendency and boldness do not influence 
the survival component of fitness in this species. Alternatively, the test situation 
may not tap into personality differences exhibited in the real-life situation, thus 
illustrating the potentially low ecological relevance of the standard tests for this 
species. To ensure salience of the test situation, experimental setups should account 
for species’ ecology and natural behavior. In addition, traits ideally are assessed by 
several tests on the same trait (Burns 2008). In nonexperimental studies, ecological
validity is vulnerable to the effects of context (Weiss and Adams 2008), which can
be overcome by sufficient sampling efforts. In rating studies, the ecological validity
may be left implicit (Uher 2008a), especially for terms that have a less clear
behavioral meaning. However, ecological validity can be ensured by showing that the
rated items have an equivalent in naturally occurring behavior. 

Another key criterion is sufficient reliability (i.e., agreement in assessment 
between raters/coders) to minimize the chance of rater and coder biases. In rating 
studies, this is ensured by multiple raters whose assessments are tested for
correlation. The items that do not meet the criterion are excluded from further analyses.
Rating studies have shown remarkably high interrater reliabilities in various species,
reflecting a high agreement between people on the animals’ characteristics (or,
rather, between people’s impressions thereof) (Gosling 2001). However, interrater
reliabilities are shown to be lower for items that have a behaviorally less clear 
connotation - such as eccentric, jealous, sensitive - compared to items that are 
behaviorally clearer, such as dominant, aggressive, and playful (Gosling 2001; 
Vazire et al. 2007; Dutton 2008). Reliability in rating thus seems to at least partly 
depend on how behaviorally clear the semantic meaning of the item descriptor is. 
Behavior coding studies have been noted for having low reliability or for not reporting 
reliability (Vazire et al. 2007). Indeed, reliability is often not tested or reported in 
behavioral personality studies, leaving unclear how many people coded the behavior 
and whether interobserver reliability was tested. This is in contrast to behavioral 
research outside of the personality realm, where testing for interobserver reliability 
is a standard procedure (observational research: e.g., Parr et al. 2005, Koski et al. 
2007; experimental research: e.g., Call et al. 2005, Silk et al. 2005). The absence of
reliability reporting in behavioral personality research is unfortunate. It remains 
crucial that interobserver reliability is habitually assessed in personality studies that 
rely on behavioral coding. 

To solve the debate between coders and raters as to which approach is better in 
animal personality research, Vazire et al. (2007) conducted a study using both 
methods. A total of 52 captive chimpanzees were rated on a 34-item list (a subset 
of item lists used in research on rhesus macaque and spotted hyena personality by 
Capitanio 1999 and Gosling 1998, respectively), as well as coded on a range of 
behaviors by one observer, obtaining 2-3 h of behavioral data per chimpanzee. The 
resulting scores of rated items and a selected subset of observed behaviors were 
correlated. The level of agreement between rated items and coded behavior per 
conceptually equivalent category varied greatly, from negligible to highly
significant, implying that impressions on behavior and quantification of the corresponding
behavior did not consistently match. Reliability of the rated data concerned, as is
customary, the interrater correlation of the rated item scores. This was high,
indicating that people shared their impressions about the chimpanzees’ personality. In
contrast, the reliability of the coded data was tested by treating each 15-min focal 
observation of a chimpanzee as an independent observation, which was correlated 
with other singular focal samples of the same chimpanzee (allowing calculation of 
the intraclass correlation coefficient). However, this procedure hardly represents 
good behavioral research practice for two reasons. First, a total of 2-3 h of
observation of a mammal with a complex behavioral repertoire is a very limited sample of
its general behavior patterns. Second, one focal observation cannot be considered
as an independent and representative sample of an animal’s overall behavior. With 
small sample sizes, the risk of over- or underestimating behavioral parameters is 
considerably inflated (Martin and Bateson 1993). Moreover, testing intraclass 
correlations of singular focal samples is not an equivalent reliability test to
interrater agreement in rating. To obtain a reliability measure comparable to that of
rated items, behavioral observations of several people obtained simultaneously 
should have been compared to each other. 
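
As a sketch of what such a comparison could look like, suppose two observers watch the same animal at the same time and each assigns one behavior category to every sampling interval. A chance-corrected agreement index such as Cohen's kappa can then be computed from the two sequences. The choice of kappa, the function name, and the category labels are assumptions for illustration; the studies discussed here do not prescribe a particular statistic.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two observers.

    codes_a and codes_b hold one category label per sampling interval,
    recorded simultaneously by observer A and observer B.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both observers must code the same, non-empty set of intervals")
    n = len(codes_a)
    # Proportion of intervals on which the observers gave the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance from each observer's category frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both observers use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four intervals; the observers disagree on the last one.
observer_a = ["groom", "play", "rest", "play"]
observer_b = ["groom", "play", "rest", "rest"]
print(cohens_kappa(observer_a, observer_b))
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. For continuous measures such as durations or latencies, a correlation-based index between observers would be the more natural choice.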

In sum, rating studies often suffer from a lack of independent and quantitative 
behavioral validation, which also leaves the ecological relevance of (some) rated 
items unclear. Coding studies have potential for a high construct and ecological 
validity, depending on the trait selection, experimental procedures, and sampling 
effort. Reliability among raters/observers is usually high in rating studies, whereas 
in coding studies it thus far is often left untested. 

Yet another diagnostic measure of a personality trait is repeatability or
consistency over time. Behavioral consistencies have been analyzed in many ways (Hayes
and Jenkins 1997). The current standard in behavioral research is to calculate a
trait’s repeatability (i.e., the proportion of total variation that is due to differences between individuals).
Repeatability is calculated with an analysis of variance (ANOVA), with individuals 
as a fixed factor and with a minimum of two measures of a trait for each individual 
(Lessells and Boag 1987; Bell et al. 2009). Behaviors that show low within- 
individual variation but high between-individual variation are more repeatable. In 
a recent meta-analysis, Bell et al. (2009) showed that across a large range of taxa and 
behaviors the average repeatability was significantly greater than 0, although there 
were species and sex differences. Individual differences accounted for roughly 37% 
of the variation. Also, psychological studies on animal personality have used various 
methods to test temporal consistency - for example, Cronbach’s alpha (Uher et al. 
2008) and the intraclass correlation coefficient (e.g., King and Landau 2003), which
is statistically identical to repeatability. Thus, the importance of behavioral
consistency is agreed upon in both approaches.
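
The ANOVA-based repeatability calculation can be sketched as follows. The sketch uses the variance-component form commonly attributed to Lessells and Boag (1987), r = s²_A / (s² + s²_A), where s² is the within-individual mean square and s²_A = (MS_among − MS_within) / n₀ is the among-individual variance component. The function name, the input layout, and the example numbers are illustrative assumptions.

```python
from statistics import mean


def repeatability(measures_by_individual):
    """Repeatability from a one-way ANOVA with individual as the grouping factor.

    measures_by_individual maps an individual's id to a list of at least
    two repeated measurements of the same trait.
    """
    groups = [list(v) for v in measures_by_individual.values()]
    sizes = [len(g) for g in groups]
    a = len(groups)  # number of individuals
    if a < 2 or min(sizes) < 2:
        raise ValueError("need at least two individuals with two measures each")
    total_n = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / total_n

    ss_among = sum(n * (mean(g) - grand_mean) ** 2 for g, n in zip(groups, sizes))
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (total_n - a)

    # n0 is the effective number of measures per individual; it equals the
    # common group size when every individual was measured equally often.
    n0 = (total_n - sum(n * n for n in sizes) / total_n) / (a - 1)
    var_among = (ms_among - ms_within) / n0
    if var_among + ms_within == 0:
        raise ValueError("repeatability is undefined when there is no variation")
    return var_among / (var_among + ms_within)


# Hypothetical latencies (in seconds) to approach a novel object, two trials each.
latencies = {"A": [1.0, 3.0], "B": [5.0, 7.0], "C": [9.0, 11.0]}
print(repeatability(latencies))
```

In this made-up example the individuals differ far more from each other than each differs between its own two trials, so the estimate is high (15/17, roughly 0.88). Because the among-individual component is estimated by subtraction, the value can be negative when within-individual variation exceeds among-individual variation.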


5.5 Finding Common Ground 

This chapter focuses on the differences between the psychological and biological 
approaches to animal personality. I have also stressed that it is possible to overcome 
the differences by appropriate methodological practices and improved clarity in 
reporting. Below are some aspects that I believe are of interest to both
psychological and behavioral personality research and that would likely benefit from
communication across disciplines.

Noted earlier is the importance of hierarchical structure in the psychological 
personality research tradition, which is largely ignored in the behavioral research 
tradition. However, as behavioral syndromes are now in the forefront of behavioral 
research (cf. Sih and Bell 2008), they directly link with the question of structure of 
personality traits. Which personality traits form syndromes and whether those
syndromes are conceptually similar to constructs from the human personality theory 
are interesting avenues for future research. We also need to understand whether and 
when behavioral correlations are stable and which mechanisms underpin them. 
Understanding how multiple behavioral syndromes influence overall behavior and 
how they are dependent on each other is important in its own right (cf. Sih and Bell 
2008), but unraveling structural hierarchy of animal personality also allows direct 
comparisons with human personality structure. 

Animal and human personality research would make significant advances by 
addressing the four questions posed by Tinbergen (1963): causation, function, 
ontogeny, and evolutionary history (Bell 2007; Nettle 2008). Causation of behavior 
is about the proximate mechanisms underlying behavior. Both biological and
psychological work has revealed a number of interesting mechanisms of personality.
The genetic base of personality has been confirmed by establishing a significant 
heritability of numerous traits (Bouchard and Loehlin 2001). Direct connections 
between certain personality traits and their genetic and neurochemical correlates 
have also been proposed - for example, between a polymorphic dopamine receptor 
gene (DRD4) and novelty-seeking behavior (Roussos et al. 2009; but see Klueger
et al. 2002) and between a serotonin transporter gene (5-HTT) and anxiety-related
traits (Ebstein 2006) in humans. Several other neuropeptides and hormones have
been connected to personality, including testosterone (androgen receptor
polymorphism: Westberg et al. 2009; circulating testosterone: Rowe et al. 2004),
vasopressin (Bartz and Hollander 2006), and cortisol (Hauner et al. 2008). Polymorphisms
in the DRD4 and 5-HTT genes have also been identified in some animals, including 
nonhuman primates (Livak et al. 1995; Seaman et al. 2000; Bailey et al. 2007; 
Inoue-Murayama et al. 2008), and they may influence their personality traits much 
as in humans (Inoue-Murayama et al. 2006, 2008; Bailey et al. 2007; Spinelli et al. 
2007; Inoue-Murayama 2009). Establishing the genetic and physiological bases of 
human personality is one of the key challenges in human psychology (cf. Penke 
et al. 2007), and animal models are an important part of this work. However, it is 
equally important to establish mechanisms of animal personality in their own right. 
Illuminating links between genetics, brain functions, behavioral endocrinology, and 
personality traits is important to advance our understanding of both human and 
animal personalities for fundamental and applied reasons. 

The fitness consequences of and the evolutionary mechanisms maintaining 
human personality are gaining attention (MacDonald 1995; Nettle 2005, 2006, 
2007; Penke et al. 2007). As empirical research on the costs and benefits of
personality in humans is still limited, it can benefit from the active research on these
questions in animal personality research. Identifying fitness consequences of personality
traits is one of the main goals of biological personality research. It has been shown 
that many heritable personality traits have significant fitness consequences in terms 
of reproductive output and survival (Dingemanse and Reale 2005; Smith and 
Blumstein 2008). This has evoked a new set of questions about the evolutionary
significance of personality. For example, most animal studies have addressed the fitness
effects of single traits but not of correlated traits (Smith and Blumstein 2008; but
see Sih and Watters 2005). How the different organizational levels of traits
influence the overall behavior and how they influence an individual’s fitness are interesting
future questions. Also, although some aspects of the environment’s influence on 
fitness consequences of animal personality have been shown (e.g., spatiotemporal 
variation in resource availability) (Dingemanse et al. 2004), there is much to be 
done to unravel the role of environmental conditions - e.g., predation pressure, 
social conditions, fluctuations therein - in determining fitness effects of personality 
traits (Sih and Bell 2008). 

Studying personality in social species is an especially interesting direction of 
research as some have suggested that social environments promote consistency in 
behavior (Fishman et al. 2001; Dall et al. 2004; McNamara et al. 2004) and maintain
interindividual variation in continuous behavioral traits through frequency-dependent
selection (McNamara et al. 2009). Social environments also influence how
personality traits manifest. For example, in rainbow trout (Oncorhynchus mykiss) observations
of another’s shyness made bold individuals shyer, whereas shy individuals became
bolder (Frost et al. 2007); and among zebra finches (Taeniopygia guttata) individuals’
exploration tendency increased in the company of an exploratory individual (Schuett
and Dall 2009). In addition to the effects of the social environment on the
manifestation of personality traits in general, sociability as a personality trait has been
surprisingly little studied in animals. In the common lizard (Lacerta vivipara), individual
variation in sociability has been shown to influence survival and reproductive success 
(Cote et al. 2008). Similar, but indirect, evidence comes from primates: Baboon 
(Papio cynocephalus) females’ social network size correlates positively with their 
fitness (Silk et al. 2003, 2009). Whether and how network size is dependent on 
personality in baboons is unknown. However, in young rhesus macaques, personality 
(i.e., activity and calmness) predicts the number of social relationships (Weinstein and 
Capitanio 2008). In chimpanzees (Anestis 2005) and vervet monkeys (Chlorocebus
aethiops) (Fairbanks et al. 2004), some personality traits (i.e., aggressiveness and
reactivity in chimpanzees, impulsivity in vervets) predict male rank, thus influencing 
males’ fitness. Furthermore, differences in chimpanzee alpha males’ dominance 
“styles” regarding their social grooming patterns have been identified, likely 
reflecting differences in personality traits (Foster et al. 2009), but it is yet unknown 
whether they have consequences for their fitness. 

Humans, like most other primates, are a highly social species. In human
personality, sociability is, like excitement-seeking, a facet of extraversion (Costa and
McCrae 1992), which is shown to predict sexual promiscuity (Schmitt 2004; Nettle 
2005) and social network size (Swickert et al. 2002). Furthermore, sociability has 
been shown to increase the likelihood of having children (Jokela et al. 2009). These 
findings indicate that sociability has evolutionary relevance in us as well. Therefore, 
the social environment, the particular personality traits it favors and constrains, and 
its influence on fitness are intriguing questions for human and animal personality 
research. 

Ontogeny of human personality has traditionally been the realm of developmental
psychology. Personality during early years is often referred to as temperament
(McAdams and Olson 2010). The continuity from temperament to the five
personality constructs has been scarcely studied, although some studies have
proposed a developmental scheme from the early temperament dispositions to
personality factors (reviewed by Caspi et al. 2005; Rothbart 2007; McAdams and Olson
2010). Later in life, personality develops in predictable ways: mean trait levels of 
neuroticism decrease, agreeableness and conscientiousness increase, and openness 
first increases and then decreases during adult life (Roberts et al. 2006). Early 
ontogeny and later development of personality traits have barely been studied in
animals (cf. Stamps 2003). King et al. (2008) described the development of
personality constructs in chimpanzees as nearly identical to that of humans. In
three-spined sticklebacks (Gasterosteus aculeatus), boldness and aggression were stable
through individual development in one study population but not in another (Bell 
and Stamps 2004). For great tit nestlings, handling stress (i.e., fear response to 
being handled by a human) at the age of 14 days correlated with the response 6 
months later (Fucikova et al. 2009). Developmental aspects in animal personality 
deserve more research as the age-related changes in animal personality are poorly 
understood. Moreover, they can illuminate the effects of gene-environment
interactions on personality (cf. Caspi et al. 2005; Roberts et al. 2007).

Finally, research into the evolutionary history of personality traits will benefit 
the most from comparative personality research and thus from an integrative 
phylogenetic framework (Gosling and Graybeal 2007). The process of identifying
similarities and differences in personality across the animal kingdom, including 
humans, has only just started. Some personality traits, such as exploration tendency 
and boldness, appear to be important for a whole host of species and analogous (or 
homologous) with human personality traits within the constructs of openness and 
extraversion, respectively (Gosling and John 1999; Beaton et al. 2008). However, 
we are far from understanding how personality traits have evolved in various taxa, 
which of the similarities are due to homology and which to convergence, how the 
differences can be explained, and how this relates to a range of other aspects, such 
as species’ life history, population dynamics, cognition, learning, and social
structure, to name but a few. Animal personality research is coming of age, and as it
grows it will significantly affect our understanding of human and animal behavior.
The first tentative steps toward a unified approach to animal personality have been
taken. Aligning the concepts in animal personality research by different approaches
will greatly enhance the advances in this rapidly growing field of research. 